Adaptive Feature Extraction Method for Degraded Character Recognition

Authors

  • Minoru Mori
  • Minako Sawaki
  • Junji Yamato
Abstract

Most character recognition applications target machine-printed and handwritten characters on paper documents. Recently, recognizing text in videos, web documents, and natural scenes has become an urgent demand, and research has intensified because the task is difficult (Antonacopoulos & Hu, 2004; Doermann et al., 2003; Kise & Doermann, 2007; Lienhart & Wernicke, 2002; Lyu et al., 2005; Zhang & Kasturi, 2008). The difficulty of recognizing low-quality characters in these applications stems mainly from two sources: deformation, such as the variety of font styles and style effects, and image degradation, such as background noise, blur, and low resolution. A key weakness of most conventional character recognition methods is that they tackle one problem or the other, not both.

To overcome image degradation, some methods, e.g. (Ho, 1998; Kopec, 1997; Xu & Nagy, 1999), design templates that reflect the anticipated degradation type. A robust discriminant function for recognizing degraded characters was also proposed in (Sato, 2000; Sawaki & Hagita, 1998). Unfortunately, because these methods employ image-based template matching, they are sensitive to shape deformation and fail to handle multiple fonts and style effects effectively.

Geometric features, on the other hand, are often used for recognizing multiple fonts. Stroke direction is particularly effective against character deformation (Umeda, 1996); for example, the direction contribution based on stroke run-length is effective (Akiyama & Hagita, 1990; Srihari et al., 1997; Zhu et al., 1997). However, geometric features are not robust against the loss of information caused by image degradation. In addition, although they are more robust against deformation than image-based template matching, they are not invariant to deformations such as aspect ratio fluctuation and stroke position shift. Geometric features are therefore weak against kinds of deformation that are not present in the training samples.

To overcome the deformation problems mentioned above, nonlinear shape normalization techniques (Tsukumo & Tanaka, 1988; Yamada et al., 1990) have been proposed as a pre-processing step that relocates strokes uniformly. They normalize a pattern by exploiting the distance between strokes (Tsukumo & Tanaka, 1988) and stroke line density (Yamada et al., 1990), and are aimed mainly at the recognition of Kanji characters, which consist of many strokes in mostly square patterns. Applying these methods to numerals, alphabetic characters, and kana characters, which consist of fewer strokes and are not square in shape, is therefore difficult. These methods are also ineffective for degraded characters with background noise and blur be...
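
The stroke run-length idea cited in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a binary image stored as a list of rows (1 marks a stroke pixel), four scan directions, and a contribution defined as each run length divided by the Euclidean norm of the four run lengths. The names `run_length` and `direction_contribution` are hypothetical.

```python
from math import sqrt

# Scan directions as (row step, column step):
# horizontal, vertical, and the two diagonals (an assumption of this sketch).
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def run_length(img, r, c, dr, dc):
    """Length of the run of 1-pixels passing through (r, c) along (dr, dc)."""
    h, w = len(img), len(img[0])
    length = 1  # count the pixel itself
    for sign in (1, -1):  # walk forward, then backward, along the direction
        rr, cc = r + sign * dr, c + sign * dc
        while 0 <= rr < h and 0 <= cc < w and img[rr][cc] == 1:
            length += 1
            rr += sign * dr
            cc += sign * dc
    return length


def direction_contribution(img, r, c):
    """Run lengths at stroke pixel (r, c), normalized to unit Euclidean norm."""
    if img[r][c] != 1:
        raise ValueError("(r, c) must be a stroke pixel")
    lengths = [run_length(img, r, c, dr, dc) for dr, dc in DIRECTIONS]
    norm = sqrt(sum(l * l for l in lengths))
    return [l / norm for l in lengths]


if __name__ == "__main__":
    # A one-pixel-thick horizontal stroke: the horizontal component dominates.
    bar = [
        [0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0],
    ]
    print(direction_contribution(bar, 1, 2))
```

Because the vector depends on the relative lengths of the runs rather than on exact pixel positions, it changes little when a stroke shifts slightly. A run broken by noise or blur shortens the measured length, which is in line with the abstract's remark that geometric features are not robust against degradation.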

Similar resources

Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten

Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheques, and the conversion of any handwritten document into structural text form. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...

Grayscale Feature Combination in Recognition based Segmentation for Degraded Text String Recognition

The grayscale feature is very effective for degraded character recognition. While many papers focus on different feature extraction algorithms for single-character recognition, few deal with the impact of the selected feature on segmentation. For recognition-based segmentation, good recognition performance on single characters does not always yield good segmentation performance. In this paper, tw...

Robust Feature Extraction Based on Run-Length Compensation for Degraded Handwritten Character Recognition

Conventional features are robust for recognizing either deformed or degraded characters. This paper proposes a feature extraction method that is robust for both of them. Run-length compensation is introduced for extracting approximate directional run-lengths of strokes from degraded handwritten characters. This technique is applied to the conventional feature vector based on directional runleng...

Generalization of Hindi OCR Using Adaptive Segmentation and Font Files

In this chapter, we describe an adaptive Indic OCR system implemented as part of a rapidly retargetable language tool effort and extend work found in [20, 2]. The system includes script identification, character segmentation, training sample creation, and character recognition. For script identification, Hindi words are identified in bilingual or multilingual document images using features of t...

Structural Run Based Feature Vector to Classify Printed Tamil Characters Using Neural Network

Feature extraction plays a crucial role in character recognition. The selection of a stable and representative set of features is the main problem in pattern recognition. Because of font characteristics and style variation of machine-printed Tamil characters, feature extraction remains a problem. Feature extraction involves reducing the amount of resources required to describe a ...

Prototype Extraction and Adaptive OCR

To maintain OCR accuracy with decreasing quality of page image composition, production, and digitization, it is essential to tune the system to each document. We propose a prototype extraction method for document-specific OCR systems. The method automatically generates training samples from unsegmented text images and the corresponding transcripts. It is tolerant of transcription errors, so a ...

Publication date: 2010